Latent Diffusion Bridges for Unsupervised Timbre Transfer Demo

This demo page accompanies the paper "Latent Diffusion Bridges for Unsupervised Timbre Transfer".

Source code: link

Table of Contents

  1. Timbre Transfer Results
    1.1. Normal Instruments Created with Our Method
    1.2. Pitch-Shifted Flute
    1.3. Chunk-Based Minibatch

  2. Impact of Different Sigma Max and Sigma N

  3. Shared Space

  4. Cycle Consistency

Timbre Transfer Results

Normal Instruments Created with Our Method

Source    Target    DPD    JD
flute     violin    0.07   0.0
flute     trumpet   0.05   0.0
violin    flute     0.1    0.2
violin    trumpet   0.13   0.1
trumpet   flute     0.02   0.0
trumpet   violin    0.02   0.0
bassoon   cello     0.12   0.0
cello     bassoon   0.07   0.0

Pitch-Shifted Flute

Source                        Target    DPD    JD
flute shifted -20 semitones   bassoon   0.21   0.0
flute shifted -25 semitones   bassoon   0.6    0.25

Chunk-Based Minibatch

Source   Model trained with                          Target   DPD    JD
flute    time chunk size 4, channel chunk size 0     violin   0.12   0.0
flute    time chunk size 4, channel chunk size 32    violin   0.2    0.0
violin   time chunk size 4, channel chunk size 0     flute    0.09   0.0
violin   time chunk size 4, channel chunk size 32    flute    0.13   0.1

Impact of Different Sigma Max and Sigma N

Source   Model                        Noise          Target   DPD    JD
violin   sigma_max=100, sigma_N=100   noisy violin   flute    2.39   0.64
violin   sigma_max=100, sigma_N=5     noisy violin   flute    0.12   0.1

Shared Space

The following audio samples were generated with the flute and violin models, both with sigma_max=100 and sigma_N=100, by sampling directly from N(0, sigma_max). Below are examples of flute/violin pairs that were considered melodically similar (DPD < 0.7) and pairs that were not (DPD >= 0.7).

Flute/violin pair                 DPD    JD
Similar melodies (DPD < 0.7)      0.52   0.18
Different melodies (DPD >= 0.7)   1.77   0.25
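
The similarity rule used in this section is a simple threshold on DPD. As a minimal sketch of that rule (the function and variable names are hypothetical and not taken from the paper's code; only the 0.7 threshold and the two example DPD/JD values come from this page):

```python
# Threshold stated on this page: pairs with DPD < 0.7 count as melodically similar.
DPD_THRESHOLD = 0.7


def melody_label(dpd: float, threshold: float = DPD_THRESHOLD) -> str:
    """Label a flute/violin pair from its DPD value (hypothetical helper)."""
    return "similar" if dpd < threshold else "different"


# The two example pairs reported above.
pairs = [
    {"dpd": 0.52, "jd": 0.18},
    {"dpd": 1.77, "jd": 0.25},
]
labels = [melody_label(pair["dpd"]) for pair in pairs]
```

A pair sitting exactly at DPD = 0.7 falls on the "different" side, which matches the `>=` in the category name.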

Cycle Consistency

The following results were obtained by computing the normalized L2 distance between the Encodec embeddings of the input flute audio and the Encodec embeddings generated after converting the flute to violin and back to flute.

[Two result images; no descriptions were provided.]
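
The cycle-consistency measure described above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' implementation: the page does not say how the L2 distance is normalized, so the sketch assumes division by the norm of the input embeddings (a relative error), and the array shape and the random "round-trip" embeddings are toy placeholders rather than real Encodec outputs.

```python
import numpy as np


def normalized_l2(original, reconstructed, eps: float = 1e-12) -> float:
    """Relative L2 error ||original - reconstructed|| / ||original||.

    Assumption: normalization by the input norm; the page does not
    specify which normalization was actually used.
    """
    original = np.asarray(original, dtype=np.float64)
    reconstructed = np.asarray(reconstructed, dtype=np.float64)
    if original.shape != reconstructed.shape:
        raise ValueError("embeddings must have the same shape")
    # For 2-D inputs, np.linalg.norm defaults to the Frobenius norm.
    numerator = np.linalg.norm(original - reconstructed)
    denominator = max(np.linalg.norm(original), eps)
    return float(numerator / denominator)


# Toy stand-ins for the embeddings (shape chosen only for illustration).
rng = np.random.default_rng(0)
flute_emb = rng.standard_normal((128, 250))
# Pretend flute -> violin -> flute round trip with a small perturbation.
round_trip_emb = flute_emb + 0.1 * rng.standard_normal((128, 250))

cycle_error = normalized_l2(flute_emb, round_trip_emb)
```

A value of 0 means the round trip reproduces the input embeddings exactly; larger values mean the flute-to-violin-to-flute conversion drifts further from the input.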